Abstract: Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a case study, we implemented an application where users can upload data and immediately navigate through its ad hoc dimensions. To support social networking, views can be easily shared and embedded in other Web sites. Algorithmically, our tag-cloud views are approximate range top-k queries over spontaneous data cubes. We present experimental evidence that iceberg cuboids provide adequate online approximations. We benchmark several browser-oblivious tag-cloud layout optimizations.



1 INTRODUCTION 

The Web 2.0, or Social Web, is about making social software applications available on the Web in an unrestricted manner. Enabling a wide range of distributed individuals to collaborate on data analysis tasks may lead to significant productivity gains (Heer et al., 2007; Wattenberg and Kriss, 2006). Several companies, like SocialText and IBM, are offering Web 2.0 solutions dedicated to enterprise needs. The data visualization Web sites Many Eyes (IBM, 2007) and Swivel (Swivel, Inc., 2007) have become part of the Web 2.0 landscape: over 1 million data sets were uploaded to Swivel in less than 3 months (Butler, 2007).

These Web 2.0 data visualization sites use tradi- 
tional pie charts and histograms, but also tag clouds. 
Tag clouds are a form of histogram which can repre- 
sent the amplitude of over a hundred items by varying 
the font size. The use of hyperlinks makes tag clouds 
naturally interactive. Tag clouds are used by many 
Web 2.0 sites such as Flickr, del.icio.us and Technorati. Increasingly, e-Commerce sites such as Amazon or O'Reilly Media are using tag clouds to help their users navigate through aggregated data.

Meanwhile, OLAP (On-Line Analytical Processing) (Codd, 1993) is a dominant paradigm in Business Intelligence (BI). OLAP allows domain experts to navigate through aggregated data in a multidimensional data model. Standard operations include drill-down, roll-up, dice, and slice. The data cube model (Gray et al., 1996) provides well-defined semantics and performance optimization strategies. However, OLAP requires much effort from database administrators even after the data has been cleaned, tuned and loaded: schemas must be designed in collaboration with users whose needs and requirements change quickly (Body et al., 2002; Morzy and Wrembel, 2004). Vendors such as Spotfire, Business Objects and QlikTech have reacted by proposing a new class of tools allowing end users to customize their applications and to limit the need for centralized schema crafting (Havenstein, 2003).

OLAP itself has never been formally defined, though rules have been proposed to recognize an OLAP application (Codd, 1993). In a similar manner, we propose rules to recognize Web 2.0 OLAP applications (see also Table 1):

1. Data and schemas are provided autonomously by users.

2. It is available as a Web application.

3. It supports complete online interaction over aggregated multidimensional data.

4. Users are encouraged to collaborate. 

Tag clouds are well suited for Web 2.0 OLAP. 
They are flexible: a tag cloud can represent a dozen or a hundred different amplitudes. And they are accessible: the only requirement is a browser that can display different font sizes.

We describe a tag-cloud formalism as an instance of Web 2.0 OLAP. Since we implemented a prototype, we also discuss technical issues related to application design. In particular, we used iceberg cubes (Carey and Kossmann, 1997) to generate tag clouds online when the data and schema are provided extemporaneously. Because tag clouds are meant to convey a general impression, presenting approximate measures and clustering is sufficient: we propose specific metrics to measure the quality of tag-cloud approximations. We conclude the paper with experimental results on real and synthetic data sets.

Table 1: Conventional OLAP versus Web 2.0 OLAP

Conventional OLAP    | Web 2.0 OLAP
---------------------+----------------------
recurring needs      | ephemeral projects
predefined schemas   | spontaneous schemas
centralized design   | user initiative
histograms           | tag clouds
plots and reports    | iframes, wikis, blogs
access control       | social networking



2 RELATED WORK 



There are decentralized models (Taylor and Ives, 2006) and systems (Green et al., 2007) to support collaborative data sharing without a single schema.

According to Wu et al., it is difficult to navigate an OLAP schema without help; they have proposed a keyword-driven OLAP model (Wu et al., 2007). There are several OLAP visualization techniques, including the Cube Presentation Model (CPM) (Maniatis et al., 2005), Multiple Correspondence Analysis (MCA) (Ben Messaoud et al., 2006) and other interactive systems (Techapichetvanich and Datta, 2005).

Tag clouds have been popularized by the Web site Flickr, launched in 2004. Several optimization opportunities exist: similar tags can be clustered together (Kaser and Lemire, 2007), tags can be pruned automatically (Hassan-Montero and Herrero-Solana, 2006) or by user intervention (Millen et al., 2006), tags can be indexed (Millen et al., 2006), and so on. Tag clouds can be adapted to spatio-temporal data (Russell, 2006; Jaffe et al., 2006).



3 OLAP FORMALISM 



3.1 Conventional OLAP Formalism 



Most OLAP engines rely on a data cube (Gray et al., 1996). A data cube $C$ contains a non-empty set of $d$ dimensions $\mathcal{D} = \{D_i\}_{1 \le i \le d}$ and a non-empty set of measures $\mathcal{M}$. Data cubes are usually derived from a fact table (see Table 2) where each dimension and measure is a column and all rows (or facts) have disjoint dimension tuples. Figure 1(a) gives a three-dimensional representation of the data cube.

Table 2: Fact table example

Dimensions: location, time, salesman, product. Measures: cost, profit.

location | time     | salesman | product | cost ($) | profit ($)
---------+----------+----------+---------+----------+-----------
Montreal | March    | John     | shoe    | 100      | 10
Montreal | December | Smith    | shoe    | 150      | 30
Quebec   | December | Smith    | dress   | 175      | 45
Ontario  | April    | Kate     | dress   | 90       | 10
Paris    | March    | John     | shoe    | 100      | 20
Paris    | March    | Marc     | table   | 120      | 10
Paris    | June     | Martin   | shoe    | 120      | 5
Lyon     | April    | Claude   | dress   | 90       | 10
New York | October  | Joe      | chair   | 100      | 10
New York | May      | Joe      | chair   | 90       | 10
Detroit  | April    | Jim      | dress   | 90       | 10



Measures can be aggregated using several opera- 
tors such as AVERAGE, MAX, MIN, SUM, and COUNT. 
All of these measures and dimensions are typically 
prespecified in a database schema. Database adminis- 
trators preaggregate views to accelerate queries. 

The data cube supports the following operations: 

• A slice specifies that you are only interested in some attribute values of a given dimension. For example, one may want to focus on one specific product (see Figure 1(g)). Similarly, a dice selects ranges of attribute values (see Figure 1(e)).

• A roll-up aggregates the measures on coarser attribute values. For example, from the sales given for every store, a user may want to see the sales aggregated per country (see Figure 1(c)). A drill-down is the reverse operation: from the sales per country, one may want to explore the sales per store in one country.

The various specific multidimensional views in Figure 1 are called cuboids.
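The operations above can be sketched in a few lines of Python over the facts of Table 2. This is a minimal illustration, not the implementation used in our prototype: the helper names (`cuboid`, `slice_`, `dice`, `roll_up`) are hypothetical, and the location-to-country hierarchy `COUNTRY` is an assumption, since Table 2 does not define one.

```python
from collections import defaultdict

# Facts of Table 2: (location, time, salesman, product, cost, profit), in dollars.
COLS = ("location", "time", "salesman", "product", "cost", "profit")
ROWS = [
    ("Montreal", "March", "John", "shoe", 100, 10),
    ("Montreal", "December", "Smith", "shoe", 150, 30),
    ("Quebec", "December", "Smith", "dress", 175, 45),
    ("Ontario", "April", "Kate", "dress", 90, 10),
    ("Paris", "March", "John", "shoe", 100, 20),
    ("Paris", "March", "Marc", "table", 120, 10),
    ("Paris", "June", "Martin", "shoe", 120, 5),
    ("Lyon", "April", "Claude", "dress", 90, 10),
    ("New York", "October", "Joe", "chair", 100, 10),
    ("New York", "May", "Joe", "chair", 90, 10),
    ("Detroit", "April", "Jim", "dress", 90, 10),
]
FACTS = [dict(zip(COLS, row)) for row in ROWS]

# Hypothetical location -> country hierarchy (assumed, not part of Table 2).
COUNTRY = {"Montreal": "Canada", "Quebec": "Canada", "Ontario": "Canada",
           "Paris": "France", "Lyon": "France",
           "New York": "USA", "Detroit": "USA"}


def cuboid(facts, dims, measure="profit", agg=sum):
    """Group facts by the given dimensions and aggregate one measure."""
    groups = defaultdict(list)
    for fact in facts:
        groups[tuple(fact[d] for d in dims)].append(fact[measure])
    return {key: agg(values) for key, values in groups.items()}


def slice_(facts, dim, value):
    """Slice: keep only the facts whose attribute `dim` equals `value`."""
    return [f for f in facts if f[dim] == value]


def dice(facts, allowed):
    """Dice: keep facts whose attributes fall in the allowed value sets."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def roll_up(facts, dim, parent, new_dim):
    """Roll-up: replace attribute `dim` by its coarser value in `parent`."""
    rolled = []
    for f in facts:
        g = {k: v for k, v in f.items() if k != dim}
        g[new_dim] = parent[f[dim]]
        rolled.append(g)
    return rolled


print(cuboid(slice_(FACTS, "product", "shoe"), ["location"]))
# {('Montreal',): 40, ('Paris',): 25}
print(cuboid(roll_up(FACTS, "location", COUNTRY, "country"), ["country"]))
# {('Canada',): 95, ('France',): 45, ('USA',): 30}
```

In this sketch, a drill-down simply corresponds to calling `cuboid` again on the finer `location` dimension, and COUNT is obtained by passing `agg=len`.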

3.2 Tag-Cloud OLAP Formalism 

A Web 2.0 OLAP application should be supported by a flexible formalism that can adapt to a wide range of data loaded by users. Processing time must be reasonable and batch processing should be avoided.

Figure 1: Conventional OLAP operations vs. tag-cloud OLAP operations. Panels: (a) OLAP data cube; (b) tag-cloud data cube; (c) OLAP roll-up; (d) tag-cloud roll-up; (e) OLAP dice; (f) tag-cloud dice; (g) OLAP slice; (h) tag-cloud slice.

Unlike in conventional data cubes, we do not expect that most dimensions have explicit hierarchies when they are loaded: instead, users can specify how the data is laid out (see Section 5). As a related issue, the dimensions are not orthogonal in general: there might be a "City" dimension as well as a "Climate Zone" dimension. It is up to the user to organize the cities per climate zone or per country.

Definition 1 (Tag) A tag is a term or phrase describing an object, with a corresponding non-negative weight determining its relative importance. Hence, a tag is made of a triplet (term, object, weight).

As an example, a picture may have been attributed 
the tags "dog" (12 times) and "cat" (20 times). In 
a Business Intelligence context, a tag may describe 
the current state of a business. For example, the tags "USA" ($16,000) and "Canada" ($8,000) describe the sales of a given product by a given salesman.

We can aggregate several attribute values, such as "Canada" and "March," into a single term, such as "Canada-March." A tag composed of $k$ attribute values is called a $k$-tag. Figure 1(b) shows a tag-cloud representation of Table 2 using 3-tags.
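Building $k$-tags amounts to concatenating $k$ attribute values into one term and aggregating the measure of all facts that share that term. The sketch below assumes SUM of profit as the tag weight and a hyphen as separator (matching terms such as "Paris-March"); the function name `k_tags` is hypothetical.

```python
from collections import Counter

# Table 2 facts: (location, time, salesman, product, cost, profit).
ROWS = [
    ("Montreal", "March", "John", "shoe", 100, 10),
    ("Montreal", "December", "Smith", "shoe", 150, 30),
    ("Quebec", "December", "Smith", "dress", 175, 45),
    ("Ontario", "April", "Kate", "dress", 90, 10),
    ("Paris", "March", "John", "shoe", 100, 20),
    ("Paris", "March", "Marc", "table", 120, 10),
    ("Paris", "June", "Martin", "shoe", 120, 5),
    ("Lyon", "April", "Claude", "dress", 90, 10),
    ("New York", "October", "Joe", "chair", 100, 10),
    ("New York", "May", "Joe", "chair", 90, 10),
    ("Detroit", "April", "Jim", "dress", 90, 10),
]
DIM_INDEX = {"location": 0, "time": 1, "salesman": 2, "product": 3}
PROFIT = 5


def k_tags(rows, dims, measure=PROFIT):
    """Map each k-tag term (attribute values joined by '-') to its weight:
    the SUM of the chosen measure over the facts sharing that term."""
    weights = Counter()
    for row in rows:
        term = "-".join(row[DIM_INDEX[d]] for d in dims)
        weights[term] += row[measure]
    return dict(weights)


two_tags = k_tags(ROWS, ["location", "time"])  # 2-tags: a roll-up on product
print(len(two_tags), two_tags["Paris-March"])  # 10 30
```

Passing three dimensions, e.g. `["location", "time", "product"]`, yields 3-tags such as "Paris-March-table".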

Each tag T is represented visually using a font 
size, font color, background color, area or motif, de- 
pending on its measure values. 
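The text does not fix how measure values map to font sizes. A common choice, assumed here, is linear interpolation between a minimum and a maximum size; the pixel bounds and the helper names `font_sizes` and `to_html` are illustrative only.

```python
import html


def font_sizes(weights, min_px=10, max_px=36):
    """Linearly map tag weights to integer font sizes in pixels (assumed scheme)."""
    lo, hi = min(weights.values()), max(weights.values())
    sizes = {}
    for term, w in weights.items():
        if hi == lo:
            # All weights equal: every tag gets the same middle size.
            sizes[term] = (min_px + max_px) // 2
        else:
            sizes[term] = round(min_px + (w - lo) * (max_px - min_px) / (hi - lo))
    return sizes


def to_html(weights, min_px=10, max_px=36):
    """Render the tags as HTML spans whose font size reflects their weight."""
    sizes = font_sizes(weights, min_px, max_px)
    return " ".join(
        f'<span style="font-size:{sizes[t]}px">{html.escape(t)}</span>'
        for t in weights
    )


print(to_html({"USA": 16000, "Canada": 8000}))
# <span style="font-size:36px">USA</span> <span style="font-size:10px">Canada</span>
```

A logarithmic scale is another option when weights span several orders of magnitude; font color or background color could be derived from a second measure in the same way.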
Figure 2: User-driven schema design



3.3 Tag-Cloud Operations 

In our system, users can upload data, select a data set, and define a schema by choosing dimensions (see Figure 2). Then, users can apply various operations on the data using a menu bar. On the one hand, OLAP operations such as slice, dice, roll-up and drill-down generate new tag clouds and new cuboids from existing cuboids. Figures 1(d), 1(f), and 1(h) show the results of a roll-up, a dice, and a slice as tag clouds. On the other hand, we can apply some operations on an existing tag cloud: sort by either the weights or the terms of tags, remove some tags, remove lesser-weighted tags, and so on. We estimate that a tag cloud should not have more than 150 tags.
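The operations on an existing tag cloud are simple list manipulations. A minimal sketch, assuming a tag cloud is a dict from term to weight; `MAX_TAGS` encodes the 150-tag estimate above, and the function names are hypothetical.

```python
import heapq

MAX_TAGS = 150  # estimated upper bound on the number of tags in a cloud


def sort_by_weight(cloud):
    """Heaviest tags first; ties broken alphabetically."""
    return sorted(cloud.items(), key=lambda kv: (-kv[1], kv[0]))


def sort_by_term(cloud):
    """Alphabetical order of the terms."""
    return sorted(cloud.items())


def remove_tags(cloud, terms):
    """Drop the tags the user explicitly removed."""
    return {t: w for t, w in cloud.items() if t not in terms}


def keep_heaviest(cloud, k=MAX_TAGS):
    """Remove lesser-weighted tags, keeping at most k tags."""
    return dict(heapq.nlargest(k, cloud.items(), key=lambda kv: kv[1]))


cloud = {"Paris-March": 30, "Quebec-December": 45, "Paris-June": 5}
print(sort_by_weight(cloud))
# [('Quebec-December', 45), ('Paris-March', 30), ('Paris-June', 5)]
print(keep_heaviest(remove_tags(cloud, {"Quebec-December"}), k=1))
# {'Paris-March': 30}
```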

Tag-cloud layout has measurable benefits when trying to convey a general impression (Rivadeneira et al., 2007). Hence, we wish to optimize the visual arrangement of tags. Chen et al. propose the computation of similarity measures between cuboids to help users explore data (Chen et al., 2000): we apply this idea to define similarities between tags. First of all, users are asked to provide one or several dimensions they want to use to cluster the tags. Choosing the "Country" dimension would mean that the user wants the tags rearranged by country so that "Montreal-April" and "Toronto-March" are nearby (see Figure 3). The clustering dimensions selected by the user together with the tag-cloud dimensions form a cuboid: in our example, we have the dimensions "Country," "City," and "Time." Since a tag contains a set of attribute values, it has a corresponding subcuboid defined by slicing the cuboid.

Figure 3: Choosing similarity dimensions (the user selects tag-support dimensions, whose attribute values are combined to derive the tag cloud, and clustering dimensions, used to cluster the tags)

Figure 4: Tag-cloud reordering based on similarity (panels: without similarity; with similarity)
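Concretely, slicing the cuboid at a tag's attribute values leaves a vector indexed by the clustering dimensions. The sketch below does this for "City-Product" 2-tags with "Country" as the clustering dimension, using the facts of Table 2 with SUM of profit as the measure; the city-to-country mapping and the name `subcuboids` are assumptions for illustration.

```python
from collections import defaultdict

# Table 2 facts: (location, time, salesman, product, cost, profit).
ROWS = [
    ("Montreal", "March", "John", "shoe", 100, 10),
    ("Montreal", "December", "Smith", "shoe", 150, 30),
    ("Quebec", "December", "Smith", "dress", 175, 45),
    ("Ontario", "April", "Kate", "dress", 90, 10),
    ("Paris", "March", "John", "shoe", 100, 20),
    ("Paris", "March", "Marc", "table", 120, 10),
    ("Paris", "June", "Martin", "shoe", 120, 5),
    ("Lyon", "April", "Claude", "dress", 90, 10),
    ("New York", "October", "Joe", "chair", 100, 10),
    ("New York", "May", "Joe", "chair", 90, 10),
    ("Detroit", "April", "Jim", "dress", 90, 10),
]
# Hypothetical city -> country mapping (assumed; Table 2 has no country).
COUNTRY = {"Montreal": "Canada", "Quebec": "Canada", "Ontario": "Canada",
           "Paris": "France", "Lyon": "France",
           "New York": "USA", "Detroit": "USA"}
DIM_INDEX = {"location": 0, "time": 1, "salesman": 2, "product": 3}
PROFIT = 5


def subcuboids(rows, tag_dims):
    """Map each tag (tuple of attribute values) to its subcuboid: a sparse
    vector over the clustering dimension (country) holding SUM(profit)."""
    vectors = defaultdict(lambda: defaultdict(int))
    for row in rows:
        tag = tuple(row[DIM_INDEX[d]] for d in tag_dims)
        vectors[tag][COUNTRY[row[0]]] += row[PROFIT]
    return {tag: dict(vec) for tag, vec in vectors.items()}


vecs = subcuboids(ROWS, ["location", "product"])
print(vecs[("Montreal", "shoe")], vecs[("Paris", "shoe")])
# {'Canada': 40} {'France': 25}
```

Because each city belongs to a single country here, tags from the same country end up with parallel vectors, which is what lets a similarity measure group them.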

Several similarity measures can be applied between subcuboids: Jaccard, Euclidean distance, cosine similarity, Tanimoto similarity, Pearson correlation, Hamming distance, and so on. Which similarity measure is best depends on the application at hand, so advanced users should be given a choice. Commonly, similarity measures take values in the interval $[-1, 1]$. Similarity measures are expected to be reflexive ($f(a, a) = 1$), symmetric ($f(a, b) = f(b, a)$) and transitive: if $a$ is similar to $b$, and $b$ is similar to $c$, then $a$ is also similar to $c$.
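Two of these measures, cosine and Tanimoto similarity, are easy to compute on sparse subcuboid vectors. In this minimal sketch, a subcuboid is a dict keyed by cells (tuples with one coordinate per clustering dimension), so a single summation over cells covers every dimension at once. The helper names and the convention of returning 0.0 for zero vectors are assumptions, not part of the described system.

```python
import math
import random


def dot(v, w):
    """Sum of v[c] * w[c] over the cells c present in both sparse vectors."""
    return sum(x * w[c] for c, x in v.items() if c in w)


def cosine(v, w):
    """sum v_i w_i / sqrt(sum v_i^2 * sum w_i^2)."""
    norm = math.sqrt(dot(v, v)) * math.sqrt(dot(w, w))
    return dot(v, w) / norm if norm else 0.0  # 0.0 for zero vectors (assumed)


def tanimoto(v, w):
    """sum v_i w_i / (sum v_i^2 + sum w_i^2 - sum v_i w_i)."""
    p = dot(v, w)
    denom = dot(v, v) + dot(w, w) - p
    return p / denom if denom else 0.0  # 0.0 for zero vectors (assumed)


# Subcuboids of three City-Product tags, clustered by country.
montreal_shoe = {("Canada",): 40}
quebec_dress = {("Canada",): 45}
paris_shoe = {("France",): 25}
print(cosine(montreal_shoe, quebec_dress), cosine(montreal_shoe, paris_shoe))  # 1.0 0.0

# With binary values, Tanimoto reduces to Jaccard: |{b}| / |{a, b, c}| = 1/3.
print(tanimoto({"a": 1, "b": 1}, {"b": 1, "c": 1}))

# Numerical check of the transitivity bound
# cos(v, z) >= cos(w, z) - sqrt(2 - 2 cos(v, w)).
rng = random.Random(42)
for _ in range(1000):
    v, w, z = ({i: rng.uniform(-1, 1) for i in range(5)} for _ in range(3))
    bound = cosine(w, z) - math.sqrt(max(0.0, 2 - 2 * cosine(v, w)))
    assert cosine(v, z) >= bound - 1e-9
```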

Recall that given two vectors $v$ and $w$, the cosine similarity measure is defined as
$\cos(v, w) = \frac{\sum_i v_i w_i}{\sqrt{\sum_i v_i^2 \sum_i w_i^2}} = \frac{v}{\|v\|} \cdot \frac{w}{\|w\|}$.
The Tanimoto similarity is given by
$T(v, w) = \frac{\sum_i v_i w_i}{\sum_i v_i^2 + \sum_i w_i^2 - \sum_i v_i w_i}$;
it becomes the Jaccard similarity when the vectors have binary values. Both of these measures are reflexive, symmetric and transitive. Specifically, the cosine similarity is transitive by this inequality:
$\cos(v, z) \ge \cos(w, z) - \sqrt{2 - 2\cos(v, w)}$.
To generalize the formulas from vectors to cuboids, it suffices to replace the single summation by one summation per dimension. Figure 4 shows an example of tag-cloud reordering to cluster similar tags. In this example, the "City-Product" tags were compared according to the "Country" dimension. The result is that the tags are clustered by countries.


4 FAST COMPUTATION 

Because only a moderate number of tags can be displayed, the computation of tag clouds is a form of top-$k$ query: given any user-specified range of cells, we seek the top-$k$ cells having the largest measures. There is little hope of answering such queries in near-constant time with respect to the number of facts without an index or a buffer. Indeed, finding all and only the elements with frequency exceeding a given frequency threshold (Cormode and Muthukrishnan, 2005) or merely finding the most frequent element (Alon et al., 1996) requires $\Omega(m)$ bits, where $m$ is the number of distinct items.
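For reference, the naive exact answer scans every cell in the requested range, so its cost grows with the number of cells; this is the work that indexes, buffers or approximations try to avoid. A minimal sketch, assuming a cuboid stored as a dict from cell tuples to measures and a range given as one set of allowed values per dimension (`None` meaning unrestricted); `range_top_k` is a hypothetical name.

```python
import heapq


def range_top_k(cells, ranges, k):
    """Exact range top-k by a full scan.

    cells: dict mapping a tuple of attribute values to a measure.
    ranges: one set of allowed values per dimension, or None for any value.
    Returns the k (cell, measure) pairs with the largest measures.
    """
    in_range = (
        (coords, measure)
        for coords, measure in cells.items()
        if all(allowed is None or c in allowed for c, allowed in zip(coords, ranges))
    )
    # nlargest keeps a heap of at most k items while scanning all cells in range.
    return heapq.nlargest(k, in_range, key=lambda item: item[1])


cells = {("Paris", "March"): 30, ("Paris", "June"): 5,
         ("Montreal", "December"): 30, ("Lyon", "April"): 10}
print(range_top_k(cells, [{"Paris", "Lyon"}, None], k=2))
# [(('Paris', 'March'), 30), (('Lyon', 'April'), 10)]
```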

Various efficient techniques have been proposed for the related range MAX problem (Chazelle, 1988; Poon, 2003), but they do not necessarily generalize. Instead, for the range top-$k$ problem, we can partition sparse data cubes into customized data structures to speed up queries by an order of magnitude (Luo et al., 2001; Loh et al., 2002a; Loh et al., 2002b). We can also answer range top-$k$ queries using RD-trees (Chung et al., 2007) or R-trees (Seokjin et al., 2005). In tag clouds, however, precision is not required and accuracy is less important: typically, only the most significant tags are needed. Further, if all tags have similar weights, then any subset of tags may form an acceptable tag cloud.

A strategy to speed up top-$k$ queries is to transform them into comparatively easier iceberg queries (Carey and Kossmann, 1997). For example, in computing the top-10 ($k = 10$) best vendors, one could start by finding all vendors with a rating above 4/5. If there are at least 10 such vendors, then sorting this smaller list is enough. If not, one can restart the query, seeking vendors with a rating above 3/5. Given a histogram or selectivity estimates, we can reduce the number of expected iceberg queries (Donjerkovic and Ramakrishnan, 1999). Unfortunately, this approach is not necessarily applicable to multidimensional data, since even computing iceberg aggregates once for each query may be prohibitive. However, iceberg cuboids can still be put to good use. That is, one materializes the iceberg of a cuboid, small enough to fit in main memory, from which the tag clouds are computed. Intuitively, a cuboid representing the largest measures is likely to provide reasonable tag clouds. Users mostly notice tags with large font sizes (Rivadeneira et al., 2007). A good approximation captures the tags having significantly larger weights. To determine whether a tag cloud has such significant tags, we can compute the entropy.

Figure 5: Example of non-informative tag cloud
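Both uses of icebergs can be sketched briefly. The first function follows the vendor example, lowering the threshold until an iceberg query returns at least $k$ items; the threshold schedule is an assumption. The second materializes the iceberg of a cuboid as its `limit` largest cells (the iceberg limit of Section 6.1) and derives an approximate tag cloud from those cells alone. All names are hypothetical; this is not the prototype's code.

```python
import heapq
from collections import Counter


def top_k_via_iceberg(ratings, k, thresholds=(0.8, 0.6, 0.4, 0.2)):
    """Top-k by successive iceberg queries with decreasing thresholds.

    Each pass keeps only the items rated strictly above the threshold; once a
    pass returns at least k items, sorting that smaller list is enough.
    """
    for t in thresholds:
        iceberg = [(item, r) for item, r in ratings.items() if r > t]
        if len(iceberg) >= k:
            return sorted(iceberg, key=lambda ir: ir[1], reverse=True)[:k]
    # Every threshold was too selective: fall back to sorting everything.
    return sorted(ratings.items(), key=lambda ir: ir[1], reverse=True)[:k]


def iceberg_cuboid(cells, limit):
    """Keep only the `limit` cells with the largest measures."""
    return dict(heapq.nlargest(limit, cells.items(), key=lambda kv: kv[1]))


def approx_tag_cloud(iceberg, tag_positions, size):
    """Project iceberg cells onto the tag dimensions (positions in the cell
    tuple), sum the measures, and keep the `size` heaviest tags."""
    weights = Counter()
    for coords, measure in iceberg.items():
        weights["-".join(coords[i] for i in tag_positions)] += measure
    return dict(weights.most_common(size))


ratings = {"a": 0.9, "b": 0.85, "c": 0.7, "d": 0.5}
print(top_k_via_iceberg(ratings, 3))  # [('a', 0.9), ('b', 0.85), ('c', 0.7)]

cells = {("A", "x"): 50, ("A", "y"): 40, ("B", "x"): 30, ("C", "y"): 2, ("D", "x"): 1}
print(approx_tag_cloud(iceberg_cuboid(cells, limit=3), [0], size=2))  # {'A': 90, 'B': 30}
```

With non-negative measures such as COUNT, the approximate weights can only underestimate the exact ones, because cells outside the iceberg are dropped; in this toy example the approximation still recovers the exact top-2 tags.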

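Definition 2 (Entropy of a tag cloud) Let $T \in \mathcal{T}$ be a tag from a tag cloud $\mathcal{T}$; then
$\mathrm{entropy}(\mathcal{T}) = -\sum_{T \in \mathcal{T}} p(T) \log(p(T))$, where $p(T) = \frac{\mathrm{weight}(T)}{\sum_{T' \in \mathcal{T}} \mathrm{weight}(T')}$.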

The entropy quantifies the disparity of weights between tags. The lower the entropy, the more interesting the corresponding tag cloud is. Indeed, tag clouds with uniform tag weights have maximal entropy and are visually not very informative (see Figure 5).
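Definition 2, and the relative entropy (entropy divided by the logarithm of the tag-cloud size) used in Section 6.1, translate directly into code. The sketch assumes the natural logarithm and takes the number of tags actually in the cloud as its size; the relative entropy does not depend on the logarithm base. Function names are hypothetical.

```python
import math


def entropy(cloud):
    """Entropy of a tag cloud: -sum p(T) log p(T), where
    p(T) = weight(T) / total weight (natural logarithm)."""
    total = sum(cloud.values())
    return -sum((w / total) * math.log(w / total) for w in cloud.values() if w > 0)


def relative_entropy(cloud):
    """Entropy divided by log(number of tags): 1.0 for uniform weights
    (least informative), close to 0 when a few tags dominate."""
    n = len(cloud)
    return entropy(cloud) / math.log(n) if n > 1 else 0.0


print(relative_entropy({"a": 1, "b": 1, "c": 1, "d": 1}))   # uniform: about 1.0
print(relative_entropy({"a": 97, "b": 1, "c": 1, "d": 1}))  # skewed: about 0.12
```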

We can measure the quality of a low-entropy tag cloud by measuring false positives and negatives: a false positive happens when a tag has been falsely added to a tag cloud, whereas a false negative occurs when a tag is missing. These measures of error assume that we limit the number of tags to a moderately small number. We use the following quality indexes; index values are in [0, 1] and a value of 0 is ideal; they are not applicable to high-entropy tag clouds.

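Definition 3 Given approximate and exact tag clouds $A$ and $E$, the false-positive and false-negative indexes are normalized (to $[0, 1]$) versions of
$\max_{t \in A, t \notin E} \mathrm{weight}(t)$ and $\sum_{t \in E, t \notin A} \mathrm{weight}(t)$, respectively.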
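The two quantities of Definition 3 can be computed as follows. Definition 3 normalizes them to [0, 1]; this sketch stops at the raw numerators and leaves the normalization to the caller. It also assumes that the weight of a false positive is read from the approximate cloud $A$, since such a tag does not appear in $E$. The name `error_numerators` is hypothetical.

```python
def error_numerators(approx, exact):
    """Raw numerators of the false-positive and false-negative indexes.

    approx, exact: dicts mapping tags to weights (clouds A and E).
    Returns (max weight of a tag in A but not in E,
             total weight of the tags in E but not in A).
    A false positive's weight is read from `approx` (assumption). Both values
    are 0 when A and E contain the same tags.
    """
    fp = max((w for t, w in approx.items() if t not in exact), default=0)
    fn = sum(w for t, w in exact.items() if t not in approx)
    return fp, fn


A = {"Paris-March": 30, "Quebec-December": 45, "Lyon-April": 10}
E = {"Paris-March": 30, "Quebec-December": 45, "Montreal-December": 30}
print(error_numerators(A, E))  # (10, 30)
```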



5 TAG-CLOUD DRAWING 

While we can ensure some level of device-independent display on the Web by using images or plugins, text display in HTML may vary substantially from one browser to another. There is no common set of fonts that browsers are required to support, and Web standards do not dictate line-breaking algorithms or other typographical issues. It is not practical to simulate the browser on a server. Meanwhile, if we wish to remain accessible and to abide by open standards, producing HTML and ECMAScript is the preferred option.



Given tag-cloud data, the tag-cloud drawing prob- 
lem is to optimally display the tags, generally using 
HTML, so that some desirable properties are met, in- 
cluding the following: (1) the screen space usage is 
minimized; (2) when applicable, similar tags are clus- 
tered together. Typically, the width of the tag cloud is 
fixed, but its height can vary. 

For practical reasons, we do not wish for the 
server to send all of the data to the browser, includ- 
ing a possibly large number of similarity measures 
between tags. Hence, some of the tag-cloud drawing 
computations must be server-bound. There are two possible architectures. The first scenario is a browser-aware approach (Kaser and Lemire, 2007): given the tag-cloud data provided by the server, the browser sends back to the server some display-specific data, such as the box dimensions of various tags using different font sizes. The server then sends back an optimized tag cloud. The second approach is browser-oblivious: the server optimizes the display of the tag cloud without any knowledge of the browser by passing simple display hints. The browser can then execute a final and inexpensive display optimization. While browser-oblivious optimization is necessarily limited, it has reduced latency and it is easily cacheable.

Browser-oblivious optimization can take many forms. For example, we could send classes of tags and instruct the browser to display them on separate lines (Hassan-Montero and Herrero-Solana, 2006). In our system, tags are sent to the browser as an ordered list, using the convention that successive tags are similar and should appear nearby. Given a similarity measure $w$ between tags, we want to minimize $\sum_{p,q} w(p,q)\, d(p,q)$, where $d(p,q)$ is a distance function between the two tags in the list and the sum is over all pairs of tags. Ideally, $d(p,q)$ should be the physical distance between the tags as they appear in the browser; we model this distance with the index distance: if tag $a$ appears at index $i$ in the list and tag $b$ appears at index $j$, their distance is the integer $|i - j|$. This optimization problem is an instance of the NP-complete minimum linear arrangement (MLA) problem: an optimal linear arrangement of a graph $G = (V, E)$ is a one-to-one map $f$ from $V$ onto $\{1, 2, \ldots, N\}$ minimizing $\sum_{(u,v) \in E} |f(u) - f(v)|$.

Proposition 1 The browser-oblivious tag-cloud optimization problem is NP-complete.



There is an $O(\sqrt{\log n}\,\log\log n)$-approximation for the MLA problem (Feige and Lee, 2007) in some instances. However, for our generic purposes, the greedy NEAREST NEIGHBOR (NN) algorithm might suffice: insert any tag in an empty list, then repeatedly append a tag most similar to the latest tag in the list, until all tags have been inserted. It runs in $O(n^2)$ time, where $n$ is the number of tags. Another heuristic for the MLA problem is the PAIRWISE EXCHANGE Monte Carlo (PWMC) method (Bhasker and Sahni, 1987): after applying NN, you repeatedly consider the exchange of two tags chosen at random, permuting them if it reduces the MLA cost. Another Monte Carlo (MC) heuristic also begins with the application of NN (Johnson et al., 2004): cut the list into two blocks at a random location, test whether exchanging the two blocks reduces the MLA cost, and if so, proceed; repeat.

Figure 6: Computing tag clouds from original data vs. icebergs (x axis: number of dimensions; series: original data, iceberg); iceberg limit value set at 150 and tag-cloud size is 9 (US Income 2000).
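The three heuristics can be sketched as follows. The cost function is the index-distance objective above, $\sum_{p,q} w(p,q)\,|i-j|$, recomputed from scratch at each trial for clarity (an incremental update would be faster). The example similarity function, the fixed random seed and the trial counts are assumptions, and the names are hypothetical.

```python
import random


def mla_cost(order, sim):
    """Sum of sim(p, q) * |i - j| over all pairs of tags in the list."""
    n = len(order)
    return sum(sim(order[i], order[j]) * (j - i)
               for i in range(n) for j in range(i + 1, n))


def nearest_neighbor(tags, sim):
    """Greedy NN: start from the first tag, then repeatedly append the
    remaining tag most similar to the last one. O(n^2) similarity calls."""
    remaining = list(tags)
    order = [remaining.pop(0)]
    while remaining:
        best = max(remaining, key=lambda t: sim(order[-1], t))
        remaining.remove(best)
        order.append(best)
    return order


def pairwise_exchange(order, sim, exchanges=100, seed=0):
    """PWMC: try swapping two random tags; keep the swap if the cost drops."""
    rng = random.Random(seed)
    order = list(order)
    cost = mla_cost(order, sim)
    for _ in range(exchanges if len(order) > 1 else 0):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = mla_cost(order, sim)
        if new_cost < cost:
            cost = new_cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return order


def block_exchange(order, sim, trials=100, seed=0):
    """MC: cut the list at a random position and swap the two blocks
    if that lowers the cost."""
    rng = random.Random(seed)
    order = list(order)
    cost = mla_cost(order, sim)
    for _ in range(trials if len(order) > 1 else 0):
        cut = rng.randrange(1, len(order))
        candidate = order[cut:] + order[:cut]
        new_cost = mla_cost(candidate, sim)
        if new_cost < cost:
            order, cost = candidate, new_cost
    return order


# Hypothetical similarity: 1 if two cities share a country, else 0.
COUNTRY = {"Montreal": "Canada", "Paris": "France", "Quebec": "Canada", "Lyon": "France"}


def same_country(a, b):
    return 1.0 if COUNTRY[a] == COUNTRY[b] else 0.0


tags = ["Montreal", "Paris", "Quebec", "Lyon"]
nn = nearest_neighbor(tags, same_country)
print(nn, mla_cost(tags, same_country), mla_cost(nn, same_country))
# ['Montreal', 'Quebec', 'Paris', 'Lyon'] 4.0 2.0
```

Since every accepted move strictly lowers the cost, PWMC and MC never return an arrangement worse than the one they start from, which is why NN is a natural starting point for both.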

Additional display hints can be inserted in this list. For example, if two tags must absolutely be very close to each other, a glued token could be inserted. Also, if two tags can be permuted freely in the list, then a PERMUTABLE token could be inserted: the list could take the form of a PQ tree (Booth and Lueker, 1976).



6 EXPERIMENTS 

Throughout these experiments, we used Java version 1.6.0_02 from Sun Microsystems Inc. on an Apple Mac Pro machine with 2 Dual-Core Intel Xeon processors running at 2.66 GHz and 2 GiB of RAM.

6.1 Iceberg-Based Computation 

To validate the generation of tag clouds from icebergs, we have run tests over the US Income 2000 data set (Hettich and Bay, 2000) (42 dimensions
and about 2 x 10^ facts) as well as a synthetic 
data set (18 dimensions and 2 x 10"^ facts) provided 



by Swivel (http://www.swivel.com/data_sets/ 
[show/1002247 ). Figure [6] shows that while some tag- 
cloud computations require several minutes, iceberg- 
based computations can be much faster. 



From each data set, we generated a 4-dimensional 
data cube. We used the COUNT function to aggre- 
gate data. Tag clouds were computed from each data 
cube using the iceberg approximation with different 
values of limit: the number of facts retained. We also 
implemented exact computations using temporary ta- 
bles. We specified different values for tag-cloud size, 
limiting the maximum number of tags. For each ice- 
berg limit value and tag-cloud size, we computed the 
entropy of the tag cloud, the false-positive and false-negative indexes, and the processing time for both the iceberg approximation and the exact computation.

We plotted in Figure 7 the false-positive and false-negative indexes as a function of the relative entropy (entropy/log(tag-cloud size)) using various iceberg limit values (150, 600, 1200, 4800, and 19600) and various tag-cloud sizes (50, 100, 150, and 200), for a total of 20 tag clouds per dimension. The Y axis is in a logarithmic scale. Points having their indexes equal to zero are not displayed. As discussed in Section 4, false-positive and false-negative indexes
should be low when the entropy is low. We verify 
that for low-entropy values (< ^ log (tag-cloud size)), 
the indexes are always close to zero which indicates 
a good approximation. Meanwhile, small iceberg 
cuboids can be processed much faster. 

6.2 Similarity Computation 

Using our two data sets, we tested the NN, PWMC, and MC heuristics using both the cosine and the Tanimoto similarity measures. From data cubes made of all available dimensions, we used all possible 1-tag clouds, using successively all other dimensions as clustering dimension, for a total of $2 \times (18 \times 17 + 42 \times 41) = 4056$ layout optimizations. The iceberg limit value was set at 150. The MC heuristic never fared better than NN, even when considering a very large number of random block permutations: we rejected this heuristic as ineffective. However, as Figure 8 shows, the PWMC heuristic can sometimes significantly outperform NN when a large number (1000) of tag exchanges are considered, but it only outperforms NN by more than 20% in less than 5% of all layout optimizations. Meanwhile, PWMC can be more than an order of magnitude slower than NN: NN is 10 times faster than PWMC with 100 exchanges and 70 times faster than PWMC with 1000 exchanges. Computing the similarity function over an iceberg cuboid was moderately expensive (0.07 s) for a small iceberg cuboid (limit set to 150 cells): the exact computation of the similarity function can dwarf the cost of the heuristics (NN and PWMC) over a moderately large data set. Informal tests suggest that NN computed over a small iceberg cuboid provides meaningful visual layouts.

Figure 7: False-negative and false-positive indexes (0 is best, 1 is worst); values under 0.0001 are not included.

Figure 8: MLA costs for two examples: the PWMC heuristic was applied using 10, 100 and 1000 random exchanges.



7 CONCLUSION 



According to our experimental results, precomputing a single iceberg cuboid per data cube allows us to generate adequate approximate tag clouds online. Combined with modern Web technologies such as AJAX and JSON, it provides a responsive application. However, we plan to characterize more precisely the relationship between iceberg cubes, entropy, dimension sizes, and our quality indexes. Yet another approach to compute tag clouds quickly may be to use a bitmap index (O'Neil and Quass, 1997). While we built a Web 2.0 application with support for numerous collaboration features, such as permalinks and tag-cloud embeddings with iframe elements, we still need to experiment with live users. Our approach to multidimensional tag clouds has been to rely on $k$-tags. However, this approach might not be appropriate when a dimension has a linear flow such as time or latitude. A more appropriate approach is to allow the use of a slider (Russell, 2006) tying several tag clouds, each one corresponding to a given attribute value.